Extract shared export engine; wire CLI to summary cache and drop mypy…#112
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughA new ChangesExport Orchestration Consolidation and Refactoring
Sequence Diagram(s)sequenceDiagram
participant User as CLI/API Caller
participant Script as scripts/export.py or api/export_api.py
participant Engine as collect_export_entries
participant IDEExp as _collect_ide_export_entries
participant CLIExp as _collect_cli_export_entries
participant Filter as Exclusion Filter
User->>Script: request export (since, include_composer, include_cli)
Script->>Script: resolve workspace_path, load exclusion_rules, compute last_export_ms
Script->>Engine: collect_export_entries(workspace_path, rules, since, last_export_ms)
Engine->>Engine: prepare_workspace_orchestration(fingerprint for caching)
alt include_composer=True
Engine->>IDEExp: load_global_db_export_data(orch)
IDEExp-->>Engine: GlobalDbExportData with composer rows
Engine->>IDEExp: _collect_ide_export_entries(orch, global_db_data, since, last_export_ms)
IDEExp->>IDEExp: iterate rows → parse JSON → filter by since
IDEExp->>Filter: build bubble/model/config/diff text + apply rules
Filter-->>Engine: CollectedExportEntry[]
end
alt include_cli=True
Engine->>CLIExp: _collect_cli_export_entries(orch, since, last_export_ms)
CLIExp->>CLIExp: enumerate projects/sessions → traverse blobs → build text
CLIExp->>Filter: apply exclusion rules
Filter-->>Engine: CollectedExportEntry[]
end
Engine-->>Script: combined list[CollectedExportEntry]
Script->>Script: write zip or markdown + update manifests
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
services/export_engine.py (1)
91-92: 📐 Maintainability & Code Quality | 🔵 TrivialJustify broad exception handlers with targeted comments.
The three
except Exceptionhandlers at lines 91, 340, and 378 are defensible as intentional robustness boundaries. Add# noqa: BLE001comments with brief justifications:
- Line 91: Best-effort fallback serialization (
json.dumps→str()).- Line 340: Defensive enumeration of CLI chats; any error is logged and enumeration is skipped.
- Line 378: Defensive iteration of CLI sessions; any error is logged and session is skipped.
Each already logs the exception with
exc_info=Truefor observability, making them suitable for broad catches in boundary functions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/export_engine.py` around lines 91 - 92, Add `# noqa: BLE001` comments to justify the three broad `except Exception` handlers in the export_engine.py file. For the handler around line 91 that provides best-effort fallback serialization using str() when json.dumps fails, add a comment explaining this is an intentional robustness boundary for serialization. For the handler around line 340 that defensively enumerates CLI chats with logging, add a comment explaining any error is logged and enumeration is skipped. For the handler around line 378 that defensively iterates over CLI sessions with logging, add a comment explaining any error is logged and the session is skipped. Each comment should be brief and positioned on the same line or immediately before the except Exception clause to acknowledge that these broad catches are intentional and already provide observability through exc_info=True logging.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/export.py`:
- Around line 177-181: The code at line 180 assumes the `lastExportTime` value
is always a string and calls `.replace()` directly on it, but it could be a
numeric timestamp or other format in a valid JSON state file, causing a crash.
Replace the manual datetime parsing and string manipulation (the `.replace()`
call and `datetime.fromisoformat()` logic) with the existing `to_epoch_ms()`
helper function, which safely handles various timestamp formats and variants.
Pass the raw `ts` value directly to `to_epoch_ms()` instead of trying to parse
it as a string first.
In `@services/export_engine.py`:
- Around line 217-227: The current code only catches JSON parsing errors but
doesn't validate the structure of successfully parsed data. After the try-except
block for json.loads in the composerData processing, add type validation to
ensure that cd is actually a dictionary before calling cd.get, and validate that
headers extracted from cd.get("fullConversationHeadersOnly") is a list of valid
dictionaries before accessing h.get on individual items. If cd is not a dict, or
if headers contains non-dict items, log as corrupt and continue to the next row,
matching the defensive pattern used elsewhere. Apply the same validation pattern
to the similar code section also mentioned around lines 272-274.
- Around line 405-413: The tool_call_texts list comprehension is extracting
tc.get("input") and tc.get("summary") values which can be non-string objects,
causing the subsequent "\n\n".join() call to fail. Normalize both the input and
summary fields by wrapping them with json_dump_safe before they are added to the
tool_call_texts list, ensuring all values are properly serialized to strings
before the join operation.
In `@services/workspace_listing.py`:
- Around line 93-104: The prepare_workspace_orchestration call on line 93 is
being passed the raw nocache parameter, but it should be passed the effective
no-cache flag that accounts for both the request parameter and the
CURSOR_CHAT_BROWSER_NOCACHE environment variable. Calculate the effective
no-cache flag by calling nocache_enabled(request_nocache=nocache) before
invoking prepare_workspace_orchestration, and then pass this effective flag
value to prepare_workspace_orchestration instead of the raw nocache parameter.
This ensures the orchestration respects the same no-cache logic that is applied
to the cache operations later in the function.
---
Nitpick comments:
In `@services/export_engine.py`:
- Around line 91-92: Add `# noqa: BLE001` comments to justify the three broad
`except Exception` handlers in the export_engine.py file. For the handler around
line 91 that provides best-effort fallback serialization using str() when
json.dumps fails, add a comment explaining this is an intentional robustness
boundary for serialization. For the handler around line 340 that defensively
enumerates CLI chats with logging, add a comment explaining any error is logged
and enumeration is skipped. For the handler around line 378 that defensively
iterates over CLI sessions with logging, add a comment explaining any error is
logged and the session is skipped. Each comment should be brief and positioned
on the same line or immediately before the except Exception clause to
acknowledge that these broad catches are intentional and already provide
observability through exc_info=True logging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2b6a37b1-08cf-4298-bc5c-080a012d1e3d
📒 Files selected for processing (4)
pyproject.tomlscripts/export.pyservices/export_engine.pyservices/workspace_listing.py
💤 Files with no reviewable changes (1)
- pyproject.toml
Use to_epoch_ms for lastExportTime parsing, validate composerData shape, serialize CLI tool-call fields safely, and pass effective nocache flag through workspace listing orchestration.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@api/export_api.py`:
- Around line 97-99: The code assumes `body` is a dictionary when accessing
`body.get("since")`, but if a client sends a JSON array or string instead of a
JSON object, this will raise an AttributeError and return a 500 error. Before
attempting to read `body.get("since")`, add a validation check to ensure `body`
is actually a dictionary. If it is not a dictionary, return an appropriate 400
Bad Request error response instead of attempting to process it further. This
validation should occur before the line where the `since` variable is assigned.
In `@tests/test_export_engine.py`:
- Around line 33-37: Replace all hardcoded /tmp directory paths like "/tmp/ws"
and "/tmp/out" with dynamically generated temporary directory paths using the
tempfile module. Import tempfile at the top of the test file, then use
tempfile.TemporaryDirectory() or tempfile.gettempdir() to generate safe
temporary paths for the workspace_path and out_dir parameters that are passed to
the function being tested. This applies to all occurrences mentioned in the test
file including lines 33-37, 53, and 83-87 where these hardcoded paths appear.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6a032144-b2ea-4385-82c2-f3cbb10c734d
📒 Files selected for processing (7)
api/export_api.pyscripts/export.pyservices/export_engine.pytests/test_api_export.pytests/test_export_engine.pytests/test_workspace_path_thread_safety.pyutils/workspace_path.py
🚧 Files skipped from review as they are similar to previous changes (2)
- services/export_engine.py
- scripts/export.py
…_ms, engine tests
There was a problem hiding this comment.
♻️ Duplicate comments (1)
services/export_engine.py (1)
256-260: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winKeep IDE timestamp semantics aligned with the no-behavior-change contract.
Line 259 explicitly says this fallback changes timestamps, filenames, and
--since last, which conflicts with the PR objective to preserve CLI export output. Either restore the legacylastUpdatedAt-only behavior here or update the acceptance criteria/release notes to call out the intentional output change.Proposed fix to preserve legacy timestamp behavior
- updated_at = to_epoch_ms(cd.get("lastUpdatedAt")) or to_epoch_ms( - cd.get("createdAt"), - ) - # Intentional behavior change vs legacy CLI: fall back to createdAt when - # lastUpdatedAt is absent (affects timestamps, filenames, and --since last). + updated_at = to_epoch_ms(cd.get("lastUpdatedAt"))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/export_engine.py` around lines 256 - 260, The updated_at assignment includes a fallback from lastUpdatedAt to createdAt which contradicts the stated no-behavior-change objective of the PR. To preserve legacy CLI behavior, remove the fallback logic in the updated_at assignment so it uses only lastUpdatedAt without falling back to createdAt. This will ensure that timestamps, filenames, and --since last operations maintain the same behavior as the original CLI implementation.
🧹 Nitpick comments (1)
scripts/export.py (1)
181-183: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winCreate
state_dirbefore all state-directory writes.
global_manifest_pathis written before the explicitos.makedirs(state_dir, exist_ok=True). Move that creation immediately after Line 181 so fresh--no-zipruns do not depend onwrite_manifest_entries()creating parents.♻️ Proposed fix
state_dir = get_global_state_dir() + os.makedirs(state_dir, exist_ok=True) state_path = os.path.join(state_dir, "export_state.json") last_export = read_last_export_ms(since, state_path=state_path) @@ - os.makedirs(state_dir, exist_ok=True) with open(os.path.join(state_dir, "export_state.json"), "w", encoding="utf-8") as f: json.dump(state, f, indent=2)Also applies to: 233-248, 256-256
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/export.py` around lines 181 - 183, The state directory is not created before files are written to it. Move the os.makedirs(state_dir, exist_ok=True) call to immediately after the state_dir assignment from get_global_state_dir() at line 181, before any writes to state_dir such as global_manifest_path. Apply this same fix to all other locations where state_dir writes occur (around lines 233-248 and 256) to ensure the directory exists before any file operations attempt to use it.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@services/export_engine.py`:
- Around line 256-260: The updated_at assignment includes a fallback from
lastUpdatedAt to createdAt which contradicts the stated no-behavior-change
objective of the PR. To preserve legacy CLI behavior, remove the fallback logic
in the updated_at assignment so it uses only lastUpdatedAt without falling back
to createdAt. This will ensure that timestamps, filenames, and --since last
operations maintain the same behavior as the original CLI implementation.
---
Nitpick comments:
In `@scripts/export.py`:
- Around line 181-183: The state directory is not created before files are
written to it. Move the os.makedirs(state_dir, exist_ok=True) call to
immediately after the state_dir assignment from get_global_state_dir() at line
181, before any writes to state_dir such as global_manifest_path. Apply this
same fix to all other locations where state_dir writes occur (around lines
233-248 and 256) to ensure the directory exists before any file operations
attempt to use it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e74ccea-2cd0-4707-b79c-556057db1761
📒 Files selected for processing (6)
api/export_api.pymodels/__init__.pymodels/export.pyscripts/export.pyservices/export_engine.pytests/test_export_engine.py
🚧 Files skipped from review as they are similar to previous changes (1)
- api/export_api.py
|
@CodeRabbit, @timon0305 's comments are addressed now? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
REPO="cppalliance/cppa-cursor-browser"
PR=112
echo "=== PR Review Comments by timon0305 ==="
gh api "repos/$REPO/pulls/$PR/comments" \
--paginate \
--jq '.[] | select(.user.login == "timon0305") | {
id,
path: .path,
line: (.line // .original_line),
body,
created_at,
updated_at,
url: .html_url
}'
echo ""
echo "=== Issue Comments by timon0305 ==="
gh api "repos/$REPO/issues/$PR/comments" \
--paginate \
--jq '.[] | select(.user.login == "timon0305") | {
id,
body,
created_at,
updated_at,
url: .html_url
}'
echo ""
echo "=== PR Reviews by timon0305 ==="
gh api "repos/$REPO/pulls/$PR/reviews" \
--paginate \
--jq '.[] | select(.user.login == "timon0305") | {
id,
state,
body,
submitted_at,
url: .html_url
}'Length of output: 3768 🏁 Script executed: #!/bin/bash
set -euo pipefail
REPO="cppalliance/cppa-cursor-browser"
PR=112
echo "=== All PR Review Comments (with resolution status) ==="
gh api "repos/$REPO/pulls/$PR/comments" \
--paginate \
--jq '.[] | {
id,
in_reply_to_id,
user: .user.login,
path: .path,
line: (.line // .original_line),
body: (.body | .[0:200]),
created_at,
url: .html_url
}'
echo ""
echo "=== Resolved review threads via GraphQL ==="
gh api graphql -F owner="cppalliance" -F name="cppa-cursor-browser" -F pr=112 -f query='
query($owner: String!, $name: String!, $pr: Int!) {
repository(owner: $owner, name: $name) {
pullRequest(number: $pr) {
reviewThreads(first: 50) {
nodes {
id
isResolved
path
line
comments(first: 3) {
nodes {
author { login }
body
createdAt
}
}
}
}
}
}
}' --jq '.data.repository.pullRequest.reviewThreads.nodes[] | {
isResolved,
path: .path,
line: .line,
first_comment_author: .comments.nodes[0].author.login,
first_comment_body: (.comments.nodes[0].body | .[0:150]),
reply_count: (.comments.nodes | length)
}'Length of output: 10575
Remaining actions before
|
rel_path nit (line ~328) — already fixed No further action needed. The “status unclear” note is wrong — this was fixed in the earlier #112 review round. createdAt fallback — resolved the other way |
Closes #106
Summary
Finishes the export-script consolidation started in #42 / PR #61 by extracting shared orchestration into
services/export_engine.py, reducingscripts/export.pyto a thin CLI wrapper, and routing the CLI through the same cached workspace-resolution path as the web layer (summary cache + mtime fingerprinting). Removes thescripts.exportmypy strict override.Continues cppa-cursor-browser #42. Companion PR (Monday #4):
os.environworkspace-path bypass fix — land after this so explicit-override plumbing slots into the shared engine.Changes
services/export_engine.py— shared orchestration:prepare_workspace_orchestration()— workspace scan, mtime fingerprint,resolve_workspace_context_cachedload_global_db_export_data()— global DB maps, composer rows, invalid-workspace aliasescollect_export_entries()— IDE + CLI conversation collection, exclusion filtering, markdown generationscripts/export.py— reduced to arg parsing, manifest/state I/O, zip/file output; delegates orchestration tocollect_export_entries()services/workspace_listing.py— usesprepare_workspace_orchestration()instead of duplicating fingerprint + context setuppyproject.toml— removed[[tool.mypy.overrides]]ignore_errors = trueforscripts.exportOut of scope
os.environ["WORKSPACE_PATH"]mutation for--base-dir(Monday PR 2, issue Fix app.py and scripts/export.py CLI: switch to argparse, add --port/--host/--base-dir, fix default host, validate --since #4)Acceptance criteria
scripts/export.pycalls shared export orchestration instead of reimplementing workspace scanning, DB reading, and composer processingscripts.exportmypy override removedscripts/export.pypassesmypy --strictTest plan
mypy --strict scripts/export.py services/export_engine.py services/workspace_listing.pypytest tests/test_cli_export_e2e.py tests/test_export_exclusion_filtering.py tests/test_cli_args.py tests/test_api_export.py -qpytest tests/test_api_workspaces.py tests/test_workspace_listing_*.py -qpython scripts/export.py --helpReferences
os.environworkspace path bypass (Fix app.py and scripts/export.py CLI: switch to argparse, add --port/--host/--base-dir, fix default host, validate --since #4)Summary by CodeRabbit
since=lastcutoff via export state, and build zip contents from returned entries.